AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence (0.96)

Neural Information Processing SystemsFeb-10-2026, 21:38:22 GMT

9622163c87b67fd5a4a0ec3247cf356e-Paper-Conference.pdf

beam search, query, sequence, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Orange County > Irvine (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.68)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Neural Information Processing SystemsFeb-10-2026, 16:17:34 GMT

d96409bf894217686ba124d7356686c9-AuthorFeedback.pdf

marginal likelihood estimate, reviewer, revision, (11 more...)

Industry: Health & Medicine (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Wu, Shiguang, Yao, Quanming

Asking LLMs to Verify First is Almost Free Lunch

arXiv.org Artificial IntelligenceDec-1-2025

To enhance the reasoning capabilities of Large Language Models (LLMs) without high costs of training, nor extensive test-time sampling, we introduce Verification-First (VF), a strategy that prompts models to verify a provided candidate answer, even a trivial or random one, before generating a solution. This approach triggers a "reverse reasoning" process that is cognitively easier and complementary to standard forward Chain-of-Thought (CoT), effectively invoking the model's critical thinking to reduce logical errors. We further generalize the VF strategy to Iter-VF, a sequential test-time scaling (TTS) method that iteratively cycles the verification-generation process using the model's previous answer. Extensive experiments across various benchmarks (from mathematical reasoning to coding and agentic tasks) and various LLMs (from open-source 1B to cutting-edge commercial ones) confirm that VF with random answer consistently outperforms standard CoT with minimal computational overhead, and Iter-VF outperforms existing TTS strategies.

artificial intelligence, large language model, natural language, (16 more...)

2511.21734

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Neural Information Processing SystemsAug-17-2025, 03:13:26 GMT

9622163c87b67fd5a4a0ec3247cf356e-Supplemental-Conference.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > China > Hong Kong (0.04)

Genre: Instructional Material (0.94)

Industry:

Education > Educational Setting > Online (0.95)
Education > Educational Technology > Educational Software > Computer Based Training (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing SystemsAug-17-2025, 03:13:22 GMT

9622163c87b67fd5a4a0ec3247cf356e-Paper-Conference.pdf

data mining, machine learning, natural language, (20 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Orange County > Irvine (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.68)

Industry:

Information Technology (0.67)
Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Neural Information Processing SystemsAug-16-2025, 17:43:20 GMT

d96409bf894217686ba124d7356686c9-AuthorFeedback.pdf

artificial intelligence, machine learning, realnvp, (13 more...)

Industry: Health & Medicine (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJun-13-2024

Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness

Wang, Dingrong, Sapkota, Hitesh, Tao, Zhiqiang, Yu, Qi

Prior neural architecture search (NAS) for adversarial robustness works have discovered that a lightweight and adversarially robust neural network architecture could exist in a non-robust large teacher network, generally disclosed by heuristic rules through statistical analysis and neural architecture search, generally disclosed by heuristic rules from neural architecture search. However, heuristic methods cannot uniformly handle different adversarial attacks and "teacher" network capacity. To solve this challenge, we propose a Reinforced Compressive Neural Architecture Search (RC-NAS) for Versatile Adversarial Robustness. Specifically, we define task settings that compose datasets, adversarial attacks, and teacher network information. Given diverse tasks, we conduct a novel dual-level training paradigm that consists of a meta-training and a fine-tuning phase to effectively expose the RL agent to diverse attack scenarios (in meta-training), and making it adapt quickly to locate a sub-network (in fine-tuning) for any previously unseen scenarios. Experiments show that our framework could achieve adaptive compression towards different initial teacher networks, datasets, and adversarial attacks, resulting in more lightweight and adversarially robust architectures.

architecture, budget, teacher network, (14 more...)

2406.06792

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > New York > Monroe County > Rochester (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Bui, Minh Duc, Schmidt, Fabian David, Glavaš, Goran, von der Wense, Katharina

Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget

arXiv.org Artificial IntelligenceApr-30-2024

Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens)-and under a fixed computation budget, smaller models are able be process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget. To test this, we compare pretraining from scratch against several KD strategies for masked language modeling (MLM) in a fair experimental setup, with respect to amount of computation as well as pretraining data. Downstream results on GLUE, however, do not confirm our hypothesis: while pretraining from scratch performs comparably to ordinary KD under a fixed computation budget, more sophisticated KD strategies, namely TinyBERT (Jiao et al., 2020) and MiniLM (Wang et al., 2023), outperform it by a notable margin. We further find that KD yields larger gains over pretraining from scratch when the data must be repeated under the fixed computation budget.

budget, computational linguistic, no-kd, (16 more...)

2404.19319

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre: Research Report > New Finding (0.94)

Industry: Education > Educational Technology > Educational Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceDec-15-2023

GreenFlow: A Computation Allocation Framework for Building Environmentally Sound Recommendation System

Lu, Xingyu, Liu, Zhining, Guan, Yanchu, Zhang, Hongxuan, Zhuang, Chenyi, Ma, Wenqi, Tan, Yize, Gu, Jinjie, Zhang, Guannan

Given the enormous number of users and items, industrial cascade recommendation systems (RS) are continuously expanded in size and complexity to deliver relevant items, such as news, services, and commodities, to the appropriate users. In a real-world scenario with hundreds of thousands requests per second, significant computation is required to infer personalized results for each request, resulting in a massive energy consumption and carbon emission that raises concern. This paper proposes GreenFlow, a practical computation allocation framework for RS, that considers both accuracy and carbon emission during inference. For each stage (e.g., recall, pre-ranking, ranking, etc.) of a cascade RS, when a user triggers a request, we define two actions that determine the computation: (1) the trained instances of models with different computational complexity; and (2) the number of items to be inferred in the stage. We refer to the combinations of actions in all stages as action chains. A reward score is estimated for each action chain, followed by dynamic primal-dual optimization considering both the reward and computation budget. Extensive experiments verify the effectiveness of the framework, reducing computation consumption by 41% in an industrial mobile application while maintaining commercial revenue. Moreover, the proposed framework saves approximately 5000kWh of electricity and reduces 3 tons of carbon emissions per day.

action chain, computation, greenflow, (15 more...)